The FMA instruction set is the name of a future extension to the 128-bit SIMD instructions in the X86 microprocessor instruction set to perform fused multiply–add (FMA) operations.[1] Two different variants of FMA instruction sets will be used:
Contents |
The FMA3 and FMA4 instruction sets have almost identical functionality but are not mutually compatible. Both contain fused multiply–add (FMA) instructions for floating point scalar and SIMD operations. It will take some time for compilers to find mechanisms to cope with the differences and optimize code accordingly.
The difference between FMA3 and FMA4 concerns the issue of whether the instruction can have three or four different operands. The FMA operation has the form:
The 4-operand form (FMA4) allows a, b, c and d to be four different registers, while the 3-operand form (FMA3) requires that d is the same register as either a, b or c. The 3-operand form makes the code shorter and the hardware implementation slightly simpler while the 4-operand form provides more programming flexibility.
See XOP instruction set for more discussion of compatibility issues between Intel and AMD.
Mnemonic (AT&T) | Operands | Operation |
---|---|---|
VFMADD132PDy | ymm, ymm, ymm/m256 | $0 = $0×$2 + $1 |
VFMADD132PSy | ||
VFMADD132PDx | xmm, xmm, xmm/m128 | |
VFMADD132PSx | ||
VFMADD132SD | xmm, xmm, xmm/m64 | |
VFMADD132SS | xmm, xmm, xmm/m32 | |
VFMADD213PDy | ymm, ymm, ymm/m256 | $0 = $1×$0 + $2 |
VFMADD213PSy | ||
VFMADD213PDx | xmm, xmm, xmm/m128 | |
VFMADD213PSx | ||
VFMADD213SD | xmm, xmm, xmm/m64 | |
VFMADD213SS | xmm, xmm, xmm/m32 | |
VFMADD231PDy | ymm, ymm, ymm/m256 | $0 = $1×$2 + $0 |
VFMADD231PSy | ||
VFMADD231PDx | xmm, xmm, xmm/m128 | |
VFMADD231PSx | ||
VFMADD231SD | xmm, xmm, xmm/m64 | |
VFMADD231SS | xmm, xmm, xmm/m32 |
Mnemonic (AT&T) | Operands | Operation |
---|---|---|
VFMADDPDx | xmm, xmm, xmm/m128, xmm/m128 | $0 = $1×$2 + $3 |
VFMADDPDy | ymm, ymm, ymm/m256, ymm/m256 | |
VFMADDPSx | xmm, xmm, xmm/m128, xmm/m128 | |
VFMADDPSy | ymm, ymm, ymm/m256, ymm/m256 | |
VFMADDSD | xmm, xmm, xmm/m64, xmm/m64 | |
VFMADDSS | xmm, xmm, xmm/m32, xmm/m32 |
The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows:
It is currently uncertain whether the 3-operand VEX coded form (here called FMA3) or the 4-operand form (FMA4) will be the dominating standard in the future. It is also possible that future processors will support both forms.
|